OPIC / CIPO

Office de la propriété intellectuelle du Canada / Canadian Intellectual Property Office

Ottawa/Hull K1A 0C9    (21) (A1) 2,108,113

(22) 1993/10/08

(43) 1995/04/09



(51) INTL.CL. 5 C12N-015/52; C12N-015/76; C12N-009/00; C12P-017/18 
C12N-001/21 



(19) (CA) APPLICATION FOR CANADIAN PATENT (12) 



(54) DNA Sequence Encoding Enzymes of Clavulanic Acid Biosynthesis






(72) Jensen, Susan E. - Canada ; 
Aidoo, Kwamena A. - Canada ; 
Paradkar, Ashish S. - Canada ; 

(71) Governors of the University of Alberta (The) - Canada ; 
(57) 39 Claims 



Notice: This application is as filed and may therefore contain an
incomplete specification.






Industrie Canada Industry Canada 






ABSTRACT 

DNA sequences are provided which encode the enzymes
required for clavulanic acid synthesis. A process is
provided for producing clavulanic acid in a transformant
of a non-clavulanate-producing host.

DNA SEQUENCE ENCODING ENZYMES OF CLAVULANIC ACID
BIOSYNTHESIS

This invention relates to methods for the production 
of the antibiotic, clavulanic acid. 

Background of the Invention

Clavulanic acid is a broad spectrum β-lactamase
inhibitor and is an important antibiotic for the
treatment of infectious diseases. It is produced
commercially by the gram-positive mycelial prokaryote
Streptomyces clavuligerus, which also produces the
β-lactam antibiotics penicillin N, desacetoxycephalosporin C
and cephamycin C. Until recently,
however, the pathway employed for clavulanic acid 
biosynthesis was much less well understood than the 
pathways leading to these other antibiotics. 

Without knowledge of the pathway for clavulanic acid 
biosynthesis, it was not possible to isolate the genes 
coding for the key enzymes and to manipulate these genes 
to increase antibiotic yield or permit production of the 
antibiotic in heterologous systems. 

One of the earliest enzymes of the pathway to be 
purified and characterised was clavaminic acid synthase. 
Two isozymes have now been identified and characterised 
(Marsh et al., (1992), Biochem., vol. 31, pp. 12648-657).

European Patent Application 0349121 describes a DNA 
restriction fragment encoding a portion of the genetic 
information involved in clavulanic acid synthesis but 
provides no sequence information. 

Until the work of the present inventors, the 
complete complement of genes required for clavulanic acid 
synthesis had not been identified. The present inventors 
have now isolated, cloned and sequenced an 11.6 kb
genomic DNA sequence from S. clavuligerus which codes for
eight proteins and enables the production of clavulanic
acid by transformants of non-clavulanate-producing
organisms.



Summary of the Invention

An isolated genomic DNA molecule is provided
comprising the nucleotide sequence set out in Figure 2.
A process is provided for producing clavulanic acid in a 
transformant of a non-clavulanate-producing host. 

Description of Drawings

The invention, as exemplified by a preferred 
embodiment, is described with reference to the 
accompanying drawings in which: 

Figure 1 shows the N-terminal amino acid sequence of
CLA and the nucleotide sequence of a probe (Sequence ID
No.: 2) directed to the underlined region of the sequence.

Figure 2 (2-1 to 2-10) shows the nucleotide sequence
(Sequence ID No.: 1) of a 15 kb genomic DNA fragment from
S. clavuligerus. The sequences of the ten ORFs within
the fragment are shown in upper case letters and the
intergenic regions are shown in lower case letters. The
locations of the beginning and end of each ORF are also
indicated directly above the nucleotide sequence.
Asterisks above the sequence indicate the EcoRI sites
which mark the beginning and end of the portion of the
DNA sequence which contains all the genetic information
for clavulanic acid synthesis.

Figure 3 shows the location of the open reading
frames downstream from pcbC.

Figure 4 shows a partial restriction map of the DNA
sequence of Figure 2 in the region surrounding cla
(ORF4).

Figure 5 shows a shuttle vector used for disruption
of the cla gene.

Figure 6 shows a photograph of an agar plate bearing
cultures of S. lividans transformants.

Figure 7 shows an alignment of the amino acid
sequence of CLA (S. clavuligerus CLA) with those of E.
coli agmatine ureohydrolase (E. coli AUH), yeast
arginase (yeast ARG), rat arginase (rat ARG) and human
arginase (human ARG).

Figure 8 shows a Southern blot of NcoI digests of
genomic DNA from five presumptive mutants (lanes 1-5) and
from wild-type S. clavuligerus (lane 6). Panel A:
membranes probed with cla-specific probe. Panel B:
membranes probed with tsr-specific probe.

Figure 9 shows restriction enzyme maps of S.
clavuligerus DNA inserts in cosmids. A. Restriction
enzyme map of cosmid K6L2. B. Partial restriction
enzyme map of cosmid K8L2. C. Restriction map of
cosmids K6L2 and K8L2 indicating location of pcbC gene in
relation to cla. D. The 2.0 kb NcoI fragment
encompassing the cla gene used in generating nested
deletions for sequencing. Abbreviations: Ba, BamHI;
B, BglII; E, EcoRI; K, KpnI; N, NcoI; S, SalI; and Sm, SmaI.

Figure 10 shows the deduced amino acid sequence
(Sequence ID No.: 3) of ORF1 of Figure 2.

Figure 11 shows the deduced amino acid sequence
(Sequence ID No.: 4) of ORF2 of Figure 2.

Figure 12 shows the deduced amino acid sequence 
(Sequence ID No.: 5) of ORF3 of Figure 2. 

Figure 13 shows the deduced amino acid sequence 
(Sequence ID No.: 6) of ORF4 of Figure 2. 

Figure 14 shows the deduced amino acid sequence 
(Sequence ID No.: 7) of ORF5 of Figure 2. 

Figure 15 shows the deduced amino acid sequence 
(Sequence ID No.: 8) of ORF6 of Figure 2. 

Figure 16 shows the deduced amino acid sequence 
(Sequence ID No.: 9) of ORF7 of Figure 2. 

Figure 17 shows the deduced amino acid sequence 
(Sequence ID No.: 10) of ORF8 of Figure 2. 

Figure 18 shows the deduced amino acid sequence
(Sequence ID No.: 11) of ORF9 of Figure 2.

Figure 19 shows the deduced amino acid sequence
(Sequence ID No.: 12) of ORF10 of Figure 2.

Detailed Description of the Invention

Production of penicillin and cephamycin antibiotics
in S. clavuligerus starts with the conversion of lysine
to α-aminoadipic acid (Madduri et al., (1989), J.
Bacteriol., v. 171, pp. 299-302; (1991), J. Bacteriol.,
v. 173, pp. 985-988). α-Aminoadipic acid then condenses
with cysteine and valine to give δ-(L-α-aminoadipyl)-L-
cysteinyl-D-valine (ACV) by the action of aminoadipyl-
cysteinyl-valine synthetase (ACVS). ACV is converted by
isopenicillin N synthase (IPNS) to isopenicillin N, and,
through a series of reactions, to desacetoxycephalosporin
C and ultimately to cephamycin C (Jensen et al., (1984),
Appl. Microbiol. Biotechnol., v. 20, pp. 155-160).

The ACVS of S. clavuligerus has been purified and
partially characterized by three separate groups, and
estimates of its molecular weight vary from 350,000 to
500,000 Da (Jensen et al., (1990), J. Bacteriol., v. 172,
pp. 7269-7271; Schwecke et al., (1992), Eur. J. Biochem.,
v. 205, pp. 687-694; Zhang and Demain, (1990), Biotech.
Lett., v. 12, pp. 649-654). During their purification,
Jensen et al. observed a 32,000 Da protein which
co-purified with ACVS despite procedures which should remove
small molecular weight components. It has now been found
that this protein is not related to ACVS but rather to
clavulanic acid biosynthesis. It has been designated
CLA.

In accordance with one embodiment of the invention,
the present inventors have identified, cloned and
sequenced the gene (cla) encoding this protein.

In accordance with a further embodiment of the
invention, the inventors have cloned and sequenced a 15
kb stretch of genomic DNA from S. clavuligerus which
includes the cla gene. Within this 15 kb sequence, the
inventors have identified an 11.6 kb DNA fragment which,
when introduced into the non-clavulanate producer S.
lividans as described in Example 4, enabled that species
to produce clavulanic acid. This indicates that the 11.6
kb fragment contains all the genetic information required
for clavulanate production.

As will be understood by those skilled in the art,
the identification of the DNA sequence encoding the
enzymes required for clavulanate synthesis will permit
genetic manipulations to modify or enhance clavulanate
production. For example, clavulanate production by S.
clavuligerus may be modified by introduction of extra
copies of the gene or genes for rate limiting enzymes or
by alteration of the regulatory components controlling
expression of the genes for the clavulanate pathway.

Heterologous organisms which do not normally
produce clavulanate may also be enabled to produce
clavulanate by introduction, for example, of the 11.6 kb
DNA sequence of the invention by techniques which are
well known in the art, as exemplified herein by the
production of S. lividans strains capable of clavulanate
synthesis. Such heterologous production of clavulanic
acid provides a means of producing clavulanic acid free
of other contaminating clavams which are produced by S.
clavuligerus.

Suitable vectors and hosts will be known to those
skilled in the art; suitable vectors include pIJ702,
pJOE829 and pIJ922 and suitable hosts include S.
lividans, S. parvulus, S. griseofulvus, S. antibioticus
and S. lipmanii.

Additionally, the DNA sequences of the invention
enable the production of one or more of the enzymes of
the clavulanate pathway by expression of the relevant
gene or genes in a heterologous expression system.

The DNA sequences coding for one or more of the
pathway enzymes may be introduced into suitable vectors
and hosts by conventional techniques known to those
skilled in the art. Suitable vectors include pUC118/119
and pET-11 and suitable hosts include many organisms,
including E. coli strains such as MV1193 and BL21(DE3).

An oligonucleotide probe based on the N-terminal
amino acid sequence of CLA was constructed as shown in
Figure 1 and was used to isolate the gene coding for the
protein from S. clavuligerus, as described in Example 1.

The gene was found to be located in the S.
clavuligerus chromosome about 5.7 kb downstream of pcbC,
the gene which encodes isopenicillin N synthase. The
gene contains a 933 bp open reading frame (ORF), encoding
a protein of molecular weight 33,368. The deduced amino
acid sequence was compared to database sequences and
showed greatest similarity to enzymes associated with
arginine metabolism, notably agmatine ureohydrolase and
arginases.

When an internal fragment of the cla gene was
labelled and used to probe restriction endonuclease
digests of genomic DNA from a variety of other
Streptomyces and related species, evidence of homologous
sequences was seen only in other clavulanic acid or
clavam metabolite producers, including Streptomyces
jumonjinensis, Streptomyces lipmanii (7) and Streptomyces
antibioticus. No cross reactivity was seen to the
β-lactam producing species Nocardia lactamdurans,
Streptomyces griseus or Streptomyces cattleya, nor to any
of a variety of other Streptomyces species which do not
produce β-lactam compounds, including S. fradiae ATCC
19609, S. venezuelae 13s and S. griseofulvus NRRL B-5429.

Disruption of the cla gene, as described in Example
3, led to loss of the ability to synthesise clavulanic
acid.

A 15 kb DNA sequence extending downstream from pcbC
was cloned and sequenced as described in Example 5. The
nucleotide sequence is shown in Figure 2. When this
sequence information was analysed for percent G + C as a
function of codon position (Bibb et al., (1984), Gene, v.
30, pp. 157-166), ten complete ORFs were evident, as
shown in Figure 3. ORF 4 corresponds to cla. ORFs 1, 7 and
8 are oriented in the opposite direction to pcbC. ORFs
2-6 and ORF 10 are all oriented in the same direction as
pcbC. ORFs 2 and 3, and ORFs 4 and 5 are separated by
very short intergenic regions suggesting the possibility
of transcriptional and translational coupling. Table 1
summarises the nucleotide sequences and lengths of ORFs
1-10.

When the predicted amino acid sequences of proteins 
encoded by ORFs 1-10 were compared to protein sequence 
databases, some similarities were noted in addition to 
the already mentioned similarity between CLA and enzymes 
of arginine metabolism. ORF 1 showed a low
level of similarity to penicillin binding proteins from
several different microorganisms which are notable for
their resistance to β-lactam compounds.

An EcoRI fragment of the 15 kb DNA sequence, 
containing 11.6 kb DNA, was cloned into a high copy 
number shuttle vector and introduced into S. lividans, as 
described in Example 4. Of seventeen transformants
examined, two were able to produce clavulanic acid, 
indicating that the 11.6 kb fragment contains all the 
necessary genetic information for clavulanic acid 
production. 

This 11.6 kb fragment encompasses ORF 2 to ORF 9 of 
the 15 kb DNA sequence. 
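
As a quick consistency check, the coordinates given in Table 1 and in claim 2 can be compared with the stated fragment size. The sketch below assumes that the EcoRI fragment corresponds to nucleotides 2033 to 13636 of Figure 2 (the range recited in claim 2); the variable and function names are illustrative only.

```python
# Coordinates (bp) taken from Table 1 and claim 2 of this document.
# Assumption: the 11.6 kb EcoRI fragment is the range recited in claim 2.
FRAGMENT = (2033, 13636)
ORF2_START = 2216   # first base of ORF 2 (Table 1)
ORF9_END = 13365    # last base of ORF 9 (Table 1)


def span_length(start, end):
    """Inclusive length of a nucleotide interval."""
    return end - start + 1


fragment_kb = span_length(*FRAGMENT) / 1000
contains_orfs = FRAGMENT[0] <= ORF2_START and ORF9_END <= FRAGMENT[1]
print(f"{fragment_kb:.1f} kb; ORFs 2 to 9 inside fragment: {contains_orfs}")
```

Under that assumption the interval length rounds to the 11.6 kb quoted in the text, and ORFs 2 through 9 fall entirely within it.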

ORF 2 shows a high degree of similarity to
acetohydroxy acid synthase (AHAS) enzymes from various
sources. AHAS catalyses an essential step in the
biosynthesis of branched chain amino acids. Since valine
is a precursor of penicillin and cephamycin antibiotics,
and valine production is often subject to feedback
regulation, it is possible that a deregulated form of
AHAS is produced to provide valine during the antibiotic
production phase. Alternatively, an AHAS-like activity
may be involved in clavulanic acid production. While the
presently recognized intermediates in the clavulanic acid
biosynthetic pathway do not indicate a role for AHAS, the
final step in the biosynthetic pathway, conversion of
clavaminic acid to clavulanic acid, requires NADPH, and
either pyruvate or α-ketobutyrate as well as other
cofactors (Elson et al., (1987), J. Chem. Soc. Chem.
Commun., pp. 1739-1740). It is striking that these same
substrates and cofactors are required for AHAS activity.
Perhaps the conversion of clavaminate to clavulanate
actually involves several steps, one of which is
catalyzed by an AHAS-like activity. ORFs 3 and 5 do not
show a significant similarity to any proteins in the
databases. ORF 6 shows similarity to ornithine
acetyltransferase. Ornithine has been suggested to be
the immediate precursor of a 5-C fragment of the
clavulanic acid skeleton, but the details of the reaction
required for the incorporation of ornithine are unknown.
ORF 7 shows weak similarity to protein XP55 from S.
lividans, and a lower level of similarity to oligopeptide
binding proteins from various other species. Similarly,
ORF 8 shows weak similarity to several transcription
activator proteins, and ORF 9 shows weak similarity to
ribitol 5-PO4 dehydrogenase-type enzymes. ORF 10 shows a
high similarity to cytochrome P450 type enzymes from
other Streptomyces species.

ORF5 has now been identified as the gene for
clavaminate synthase II (Marsh (1993) supra).

When a plasmid isolated from one of the two
clavulanic acid-producing transformants was retransformed
into S. lividans, about 40-45% of the resulting colonies
were able to produce clavulanic acid, as shown in Figure
6.

EXAMPLES

Example 1

Bacterial strains, vectors and growth conditions.

Streptomyces clavuligerus NRRL 3585, Streptomyces
jumonjinensis NRRL 5741, Streptomyces lipmanii
NRRL 3584, Streptomyces griseus NRRL 3851, Nocardia
lactamdurans NRRL 3802 and Streptomyces cattleya NRRL
3841 were provided by the Northern Regional Research
Laboratories, Peoria, IL. Streptomyces antibioticus ATCC
8663 and Streptomyces fradiae ATCC 19609 were obtained
from the American Type Culture Collection, Rockville, MD.
Streptomyces lividans strains 1326 and TK24 were provided
by D. A. Hopwood (John Innes Institute, Norwich, U.K.).
Streptomyces venezuelae 13s and Streptomyces griseofuscus
NRRL B-5429 were obtained from L.C. Vining (Department of
Biology, Dalhousie University, Halifax, N.S.). Cultures
were maintained on either MYM (Stuttard (1982) J. Gen.
Microbiol., v. 128, pp. 115-121) or on a modified R5
medium (Hopwood et al. (1985) in "Genetic Manipulation of
Streptomyces: a laboratory manual", John Innes
Foundation, U.K.) containing maltose instead of glucose
and lacking sucrose (R5-S). Escherichia coli MV1193
(Zoller and Smith (1987) Methods in Enzymology, v. 154,
pp. 329-349), used as recipient for all of the cloning
and subcloning experiments, was grown in Luria Broth (LB;
Sambrook et al. (1989) in "Molecular Cloning: a
laboratory manual", Cold Spring Harbour, N.Y.) or on LB
agar (1.5%) plates containing ampicillin (50 μg/mL) or
tetracycline (10 μg/mL). The cloning vectors pUC118 and
pUC119 (Vieira and Messing (1987) Methods in Enzymology,
v. 153, pp. 3-11) were provided by J. Vieira (Waksman
Institute of Microbiology, Rutgers University,
Piscataway, N.J.). The plasmid vector pJOE829 was
generously provided by J. Altenbuchner (University of
Stuttgart, Stuttgart, Germany). The plasmid pIJ702 was
obtained from the American Type Culture Collection,
Rockville, MD. Restriction enzymes were purchased from
Boehringer Mannheim, and used according to the
manufacturers' specifications.

Separation of CLA from ACVS

CLA was previously characterized as a 32,000 Da
molecular weight protein present in preparations of
highly purified ACVS (Jensen et al. (1990), supra). The
small size of CLA suggested that its co-purification with
ACVS resulted from a physical association between the two
proteins.

ACVS and CLA were resolved by applying a 0.2 ml
sample of purified ACVS containing CLA onto a Superose 6
HR 10/30 (Pharmacia), which was equilibrated and eluted
in 0.1 M MOPS buffer, pH 7.5, containing 0.05 M KCl, X *M
dithiothreitol, and 20% glycerol, at a flow rate of 0.25 

Comparison of the CLA retention time with those of 
molecular weight standards indicated that the native 
molecular weight of CLA was in excess of 270 kDa. The 
difference in molecular weight between native and 
denatured forms of CLA suggests that the native protein 
exists as an oligomer of eight identical subunits. 
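
The subunit estimate follows from dividing the native molecular weight by the denatured (subunit) molecular weight. The short sketch below reproduces that arithmetic using the figures quoted in this document; because 270 kDa is only a lower bound, the result is a minimum, and the function name is illustrative.

```python
NATIVE_MW_DA = 270_000        # lower bound from gel filtration ("in excess of 270 kDa")
SUBUNIT_MW_SDS_DA = 32_000    # CLA subunit size estimated by SDS-PAGE
SUBUNIT_MW_SEQ_DA = 33_368    # subunit size deduced from the cla ORF


def subunit_ratio(native_mw, subunit_mw):
    """Ratio of native to subunit molecular weight (a lower bound here)."""
    return native_mw / subunit_mw


ratio_sds = subunit_ratio(NATIVE_MW_DA, SUBUNIT_MW_SDS_DA)
ratio_seq = subunit_ratio(NATIVE_MW_DA, SUBUNIT_MW_SEQ_DA)
print(f"SDS-PAGE estimate: {ratio_sds:.2f}; sequence estimate: {ratio_seq:.2f}")
```

Both ratios round to eight, which is consistent with the octamer suggested in the text.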

Isolation of the gene (cla) for CLA

N-terminal amino acid sequence information for CLA
was obtained by electrophoretically transferring the
protein from SDS polyacrylamide gels onto Immobilon
membranes (Millipore Ltd.) and submitting the material
to the Protein Microsequencing Laboratory (University of
Victoria) for analysis. Information obtained for 25
amino acids at the N-terminus was used to prepare a
24-mer oligonucleotide probe with 8-fold degeneracy to the
amino acid sequence underlined in Figure 1. The amino
acids in brackets indicate ambiguities in the N-terminal
sequence. The actual DNA sequence from the cloned
fragment is indicated in Figure 1.

The probe was designed as an 8-fold degenerate
mixture of oligonucleotides to take into consideration
the biased codon usage of Streptomyces (Bibb et al.,
1984; Wright and Bibb (1992), Gene, v. 113, pp. 55-65).
End-labelled probe was then used to screen a cosmid
library of S. clavuligerus genomic DNA fragments as
described in Materials and Methods.
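
The degeneracy of a mixed oligonucleotide is the product of the number of bases allowed at each position. The sketch below shows how an 8-fold figure arises for a 24-mer with three two-fold ambiguous positions. The example sequence is hypothetical and is NOT the probe of Figure 1; only the standard IUPAC ambiguity codes are assumed.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a mixed oligonucleotide."""
    return prod(len(IUPAC[base]) for base in oligo.upper())


# Hypothetical 24-mer (not the real probe): three G/C wobble positions ("S").
example_probe = "GAGCGSATCGACTCSCACGTSTCC"
print(len(example_probe), degeneracy(example_probe))
```

Restricting third codon positions to G or C, as in this hypothetical example, is one way a high G + C codon bias keeps degeneracy low.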

A library of S. clavuligerus genomic DNA fragments
(15-22 kb size fractionated fragments) was constructed as
previously described (Doran et al. (1990), J. Bacteriol.,
v. 172, pp. 4909-4918), using the cosmid vector pLAFR3.
A collection of 1084 isolated E. coli colonies containing
recombinant cosmids was screened for the presence of cla
using the 24-mer mixed oligonucleotide probe (Fig. 1)
which had been end-labelled with [γ-32P]dATP and
polynucleotide kinase (Boehringer Mannheim). Colony
hybridization and subsequent washing was performed as
described by Sambrook et al. (1989), at 55°C with a
final wash in 0.2X SSC (1X SSC: 0.15 M NaCl and 0.015 M
sodium citrate) and 0.1% SDS.
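
Whether a panel of 1084 cosmid clones is likely to contain a given gene can be estimated with the usual library-coverage formula, P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one insert and N is the number of clones. The sketch below is illustrative only: the genome size of 8 Mb and the mean insert size of 18.5 kb (the midpoint of the 15-22 kb range) are assumptions and do not come from this document.

```python
N_CLONES = 1084              # colonies screened (from the text)
MEAN_INSERT_BP = 18_500      # ASSUMPTION: midpoint of the 15-22 kb size fraction
GENOME_BP = 8_000_000        # ASSUMPTION: approximate streptomycete genome size


def coverage_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given locus is present in at least one clone."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


p = coverage_probability(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
print(f"Estimated probability of recovering a given locus: {p:.2f}")
```

Under these assumed values the library has a high, though not certain, chance of containing any single locus; a different genome size would change the estimate.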

Five colonies which gave strong hybridization
signals were isolated from the panel of 1084 clones, and
restriction analysis showed that the positive clones
contained overlapping fragments of DNA. Two clones, K6L2
and K8L2, with sequences that spanned about 40 kb of the
S. clavuligerus genome, were chosen for further analysis.
Clone K8L2 contained about 22 kb of S. clavuligerus
genomic DNA and included a portion of cla and all of the
pcbC gene which encodes IPNS in the penicillin/cephamycin
biosynthetic pathway. A restriction map of K6L2 is shown
in Fig. 9. Within the approximately 27 kb of DNA
contained in K6L2, the oligonucleotide probe hybridized
to a 2.0 kb NcoI fragment which was subsequently found to
contain the entire cla gene. Hybridization studies,
restriction mapping and DNA sequence analysis revealed
that cla was situated 5.67 kb downstream of the pcbC gene
of S. clavuligerus (Fig. 9).

DNA sequencing and analysis

Ordered sets of deletions were generated (Henikoff,
1984) extending across the cla region of the 2.0 kb NcoI
fragment (Fig. 9C). The deletion generated fragments
were sequenced in both orientations by the
dideoxynucleotide chain termination method of Sanger et
al. ((1977), P.N.A.S., v. 74, pp. 5463-5467) using
Sequenase (version 2.0) DNA polymerase (United States
Biochemical Corporation). Areas of compression in the
sequence band pattern were relieved by carrying out
reactions using 7-deaza-dGTP in place of dGTP. The
nested deletion fragments resided either in pUC118 or
pUC119, and were sequenced using the commercially
available universal primers (Vieira and Messing, 1987).

The nucleotide sequence data were analyzed for the 
presence of restriction sites, open reading frames (ORFs) 
and codon usage by the PC-Gene programme (Intelligenetics
Corp.). Similarity searches were accomplished with the
FASTA program searching the GenPept database (release
number 71) available through GenBank (Pearson and Lipman
(1988), P.N.A.S., v. 85, pp. 2444-2448).

An ORF of 939 bp with a potential ribosome binding site 9 bp
from the GTG start codon was found which encoded a
putative protein with a molecular weight of 33,368 Da.
This value is in close agreement with the molecular weight
estimated for CLA by SDS-PAGE (Jensen et al., 1990). The
analysis of percent G + C as a function of codon position
(FRAME analysis), using the algorithm of Bibb et al.
(1984), indicated the presence of a typical streptomycete
ORF (data not shown) with a G + C content of 70%.
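
The idea behind a codon-position G + C analysis can be illustrated with a minimal sketch. It is a simplification and not the FRAME algorithm of Bibb et al.: it reports one value per codon position for a whole in-frame sequence rather than values over a sliding window, and the function name and toy sequence are hypothetical.

```python
def gc_by_codon_position(seq):
    """Percent G+C at codon positions 1, 2 and 3 of an in-frame sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    result = []
    for pos in range(3):
        bases = seq[pos:usable:3]         # every third base, starting at pos
        gc = sum(base in "GC" for base in bases)
        result.append(100.0 * gc / len(bases))
    return result


# Hypothetical toy sequence of three codons: GCC GCG ACC
toy = "GCCGCGACC"
first, second, third = gc_by_codon_position(toy)
print(f"pos1={first:.1f}%  pos2={second:.1f}%  pos3={third:.1f}%")
```

In genes from high G + C organisms the third codon position typically shows the highest G + C content, which is the signal such an analysis looks for when assigning the reading frame.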

Computer aided database searches for sequences similar
to cla revealed a high degree of similarity to agmatine
ureohydrolase (40.5% identity over 291 amino acids) and
somewhat lower similarity to arginases (29.6% identity
over 135 amino acids to arginases from yeast and rat) as
shown in Figure 7. The S. clavuligerus CLA sequence was
aligned with the E. coli AUH sequence by the FASTA
program described above. The AUH sequence had previously
been aligned with the three ARG sequences (Szumanski &
Boyle (1990), J. Bacteriol., v. 172, pp. 538-547).
Identical matches in two or more sequences are indicated
with upper case letters.
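
Percent identity figures of this kind are the number of identical aligned residues divided by the length of the compared region. The sketch below computes identity over the non-gap columns of a pairwise alignment; that choice of denominator is an assumption (FASTA's own reporting convention is not described here), and the two short aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments (not taken from Figure 7).
identity = percent_identity("MKT-LLVA", "MRTALL-A")
print(f"{identity:.1f}% identity")

# Back-calculation from the text: 40.5% identity over 291 residues.
identical_residues = round(0.405 * 291)
print(identical_residues)
```

The back-calculation indicates roughly how many identical positions underlie the 40.5% figure reported for agmatine ureohydrolase.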



Example 2 

DNA hybridization 

Genomic DNA preparations from various Streptomyces
species were isolated as described by Hopwood et al.
(1985). For interspecies DNA hybridization analysis, 2.0
μg amounts of genomic DNA preparations were digested with
NcoI for 16 h, and electrophoresed in 1.0% agarose gels.
The separated DNA fragments were then transferred onto
nylon membranes (Hybond-N, Amersham) and hybridized with
a cla specific probe prepared by labeling an internal 459
bp SalI fragment (Fig. 1) with [α-32P]dATP by nick
translation. Hybridization was done as described by
Sambrook et al. (1989). Hybridization membranes were
washed twice for 30 min in 2X SSC; 0.1% SDS and once for
30 min in 0.1X SSC; 0.1% SDS at 65°C.
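
The salt concentrations of the wash buffers follow by scaling the 1X SSC composition given in Example 1 (0.15 M NaCl and 0.015 M sodium citrate). A minimal sketch of that scaling, with an illustrative function name:

```python
# 1X SSC composition in mol/L, as defined in Example 1 of this document.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc(strength):
    """Molar concentrations of the SSC components at a given strength (e.g. 0.1 for 0.1X)."""
    return {component: conc * strength for component, conc in SSC_1X.items()}


for strength in (2.0, 0.2, 0.1):
    concentrations = ssc(strength)
    print(f"{strength}X SSC: "
          f"{concentrations['NaCl'] * 1000:.1f} mM NaCl, "
          f"{concentrations['sodium citrate'] * 1000:.2f} mM sodium citrate")
```

Lower salt at the final wash corresponds to higher stringency, which is why the 0.1X wash follows the 2X washes.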

Sequences homologous to cla in other Streptomycetes

Three of six producers of β-lactam antibiotics, S.
clavuligerus, S. lipmanii and S. jumonjinensis, showed
positive hybridization signals whereas S. cattleya, S.
griseus, and N. lactamdurans did not (data not shown).
None of the nonproducing strains examined, S. venezuelae,
S. lividans, S. fradiae, S. antibioticus and S.
griseofuscus, gave any signal. All of the streptomycetes
that gave positive signals were producers of clavam-type
metabolites (Elson et al., 1987).



Example 3 

Disruption of the genomic cla gene

A 2.0 kb NcoI fragment that contained the entire cla
gene was digested at its unique KpnI site and the ends
made blunt by treatment with the Klenow fragment of E.
coli DNA polymerase I. A thiostrepton resistance gene
(tsr), isolated as a 1085 bp BclI fragment from pIJ702
and cloned into the BamHI site of pUC118, was excised as a
SmaI/XbaI fragment, the ends were made blunt as above, and
the fragment was ligated into the KpnI site of cla. The
ligation mixture was introduced into E. coli MV1193 and the
transformants screened for the presence of the tsr gene by
colony hybridization (Sambrook et al., 1989).

Replacement of the chromosomal cla gene by a copy
disrupted by the insertion of tsr, at an internal KpnI
site, was achieved by double recombination. Successful
gene replacement was apparent when the 2.0 kb NcoI
fragment which carries cla in the wild type organism was
replaced by a 3.0 kb NcoI fragment due to the insertion
of the 1.0 kb tsr gene in the mutants. Four of the five
mutants tested showed the expected increase in the size
of the NcoI fragments, and the larger NcoI fragments also
hybridized with a tsr specific probe. The fifth mutant
was apparently a spontaneous thiostrepton resistant
mutant.
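
The expected Southern blot result is simple addition: the wild-type NcoI fragment plus the inserted tsr gene. The sketch below encodes that expectation and sorts band sizes into classes. The lane values and the 0.2 kb tolerance are hypothetical and chosen only for illustration; they are not measurements from Figure 8.

```python
WILD_TYPE_KB = 2.0    # NcoI fragment carrying intact cla
INSERT_KB = 1.0       # approximate size of the tsr insertion
EXPECTED_MUTANT_KB = WILD_TYPE_KB + INSERT_KB


def classify(band_kb, tolerance_kb=0.2):
    """Classify a hybridizing NcoI band by its apparent size."""
    if abs(band_kb - EXPECTED_MUTANT_KB) <= tolerance_kb:
        return "disrupted"
    if abs(band_kb - WILD_TYPE_KB) <= tolerance_kb:
        return "wild-type"
    return "other"


# Hypothetical band sizes (kb) for five lanes.
hypothetical_lanes = [3.0, 3.1, 2.9, 3.0, 2.0]
calls = [classify(band) for band in hypothetical_lanes]
print(calls)
```

A band of wild-type size in a thiostrepton-resistant isolate would be consistent with spontaneous resistance rather than gene replacement, as described for the fifth mutant.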

Antibiotic Assay 

The agar diffusion assay was used for determining
both penicillin/cephamycin and clavulanic acid
production. S. clavuligerus strains to be assayed were
grown in 10 mL amounts of Trypticase Soy Broth (TSB;
Baltimore Biological Laboratories) medium with 1.0%
starch for 48 h. The cultures were washed twice with
10.3% sucrose and once with MM (Jensen et al. (1982), J.
Antibiot., v. 35, pp. 483-490) and the mycelium
resuspended in 10.0 mL of MM. Two millilitres of washed
cell suspension was inoculated into 100 mL of MM and
incubated at 28°C for 48 h. The cultures were harvested
by centrifugation, and the supernatants were assayed for
both penicillin/cephamycin and clavulanic acid using
bioassay procedures described previously (Jensen et al.
(1982), supra).

All of the resulting colonies with disrupted cla
genes grew equally well on minimal medium and complex
media and produced as much penicillin and cephamycin as
did the wild-type, but produced no clavulanic acid (data
not shown). HPLC analysis of cell supernatants confirmed
the inability of the disrupted cla mutants to synthesize
any clavulanic acid (data not shown).



Example 4 

Protoplast formation and transformation

E. coli competent cell preparation and
transformation were as described by Sambrook et al.
(1989). Protoplasts of S. clavuligerus were prepared,
transformed and regenerated as described by Bailey et al.
(1984), Bio/Technology, v. 2, pp. 808-811, with the
following modifications. Dextrin and arginine in the
regeneration medium were replaced by starch and sodium
glutamate respectively. Protoplasts were heat shocked at
43°C for 5 min prior to the addition of DNA. Standard
procedures were used for protoplasting and transformation
of S. lividans (Hopwood et al. (1985)).

The 11.6 kb EcoRI fragment from K6L2 (Fig. 9) was
cloned into the EcoRI site of pCAT-119. pCAT-119 is a
derivative of pUC119 which was prepared by insertionally
inactivating the ampicillin resistance gene of pUC119 by
the insertion of a chloramphenicol acetyltransferase gene
(Jensen et al. (1989), Genetics & Molec. Biol. of Ind.
Microorg., pp. 239-245, Ed. Hershberger, Amer. Soc.
Microbiol.). The pCAT-119 plasmid carrying the 11.6 kb
fragment was then digested with PstI and ligated to the
Streptomyces plasmid pIJ702, which had also been digested
with PstI. The resulting bifunctional plasmid carrying
the 11.6 kb insert was capable of replicating in either E.
coli (with selection for chloramphenicol resistance) or
in S. lividans (with selection for thiostrepton
resistance). The ligation mixture was transformed to E.
coli. Plasmid DNA was isolated from several of the
chloramphenicol resistant transformants and analyzed by
agarose gel electrophoresis to ensure that the proper
plasmid construct was obtained. This isolated plasmid
material from E. coli was then transformed into S.
lividans as described by Hopwood and transformants were
selected by plating onto R2YE medium containing
thiostrepton at a concentration of 50 μg/ml.

Thiostrepton resistant S. lividans transformants
carrying the bifunctional plasmid with the 11.6 kb insert
were patched onto MYM agar plates and allowed to incubate
for 48 h at 28°C before they were overlayered with molten
soft nutrient agar containing penicillin G at a
concentration of 1 μg/ml and inoculated with
Staphylococcus aureus N-2 as indicator organism (Jensen,
1982). (S. aureus N-2 was obtained from the Department of
Microbiology Culture Collection, University of Alberta.
Any organism which produces a β-lactamase sensitive to
clavulanic acid may be used as indicator organism.)
Zones of inhibition which appeared around the S. lividans
colonies upon incubation overnight at 30°C were evidence
of clavulanic acid production. Clavulanic acid-
producing colonies were found amongst these initial S.
lividans transformants at a frequency of about 12%. When
plasmid DNA was isolated from one of these clavulanic
acid-producing transformants and re-introduced into S.
lividans, the frequency of clavulanic acid production in
these 2nd round transformants was about 40-45%. Figure 6
shows a photograph of an agar plate bearing 2nd round
transformants. Zones of inhibition are seen as clear
areas in the agar; these appear on the photograph as dark
circular areas.
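
The first-round frequency of about 12% agrees with the count given earlier in the description (two producers among seventeen transformants). A one-line check of that arithmetic, with illustrative names:

```python
PRODUCERS_FIRST_ROUND = 2       # clavulanic acid producers (from the description)
TRANSFORMANTS_EXAMINED = 17     # transformants examined (from the description)


def percent(part, total):
    """Express part/total as a percentage."""
    return 100.0 * part / total


first_round_frequency = percent(PRODUCERS_FIRST_ROUND, TRANSFORMANTS_EXAMINED)
print(f"First-round frequency: {first_round_frequency:.1f}%")
```

The second-round figure of 40-45% cannot be checked the same way because the underlying colony counts are not reported.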




Example 5 

Sequencing of 15 kb DNA fragment

Ordered sets of deletions were generated as
described in Example 1 using fragments of the DNA insert
from the cosmid clone K6L2 (Figure 9) and subcloned into
the E. coli plasmids pUC118 and pUC119. Overlapping
fragments were chosen which extended from the end of the
pcbC gene downstream for a distance of about 15 kb ending
at the BglII site. The deletion generated fragments were
sequenced in both orientations as described in Example 1.
The sequence is shown in Figure 2.

The present invention is not limited to the features 
of the embodiments described herein, but includes all 
variations and modifications within the scope of the 
claims. 
TABLE 1

ORF #   Start location   End location   Length   Size of ORF
        (bp)             (bp)           (bp)     (aa residues)

1*      109              1764           1656     552
2       2216             3937           1722     574
3       3940             5481           1542     514
4       5654             6595           942      314
5       6611             7588           978      326
6       7895             9076           1182     394
7       9241             10908          1668     556
8*      10998            12296          1299     433
9*      12622            13365          744      248
10      13769            14995          1227     409

Asterisks denote ORFs which are oriented in the opposite direction.
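
The columns of Table 1 are related by simple arithmetic: the length of each ORF is its end position minus its start position plus one, and that length is three times the listed residue count. The Python sketch below, offered only as an illustration, checks this for every row. The tuple layout and names are hypothetical; the coordinates for ORFs 1 and 2 are those recited in claims 3 and 4, and their residue counts are those given in Figures 10 and 11.

```python
# (ORF number, start bp, end bp, length bp, aa residues, opposite strand?)
TABLE_1 = [
    (1, 109, 1764, 1656, 552, True),
    (2, 2216, 3937, 1722, 574, False),
    (3, 3940, 5481, 1542, 514, False),
    (4, 5654, 6595, 942, 314, False),
    (5, 6611, 7588, 978, 326, False),
    (6, 7895, 9076, 1182, 394, False),
    (7, 9241, 10908, 1668, 556, False),
    (8, 10998, 12296, 1299, 433, True),
    (9, 12622, 13365, 744, 248, True),
    (10, 13769, 14995, 1227, 409, False),
]


def row_is_consistent(row):
    """True when length == end - start + 1 and length == 3 * residues."""
    _, start, end, length, residues, _ = row
    return end - start + 1 == length and length == 3 * residues


# ORF numbers whose columns disagree with one another (expected: none).
inconsistent = [row[0] for row in TABLE_1 if not row_is_consistent(row)]
# ORFs marked with an asterisk in Table 1.
opposite_strand = [row[0] for row in TABLE_1 if row[5]]
```

A check of this kind is useful when a coordinate is in doubt, since any single misread digit in a row breaks one of the two equalities.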
The embodiments of the invention in which an
exclusive property or privilege is claimed are defined as
follows:

1. An isolated genomic DNA molecule comprising the
nucleotide sequence of Figure 2 (Sequence ID No.:1).

2. An isolated DNA molecule having the nucleotide
sequence of nucleotides 2033 to 13636 of Figure 2
(Sequence ID No.:20).

3. An isolated DNA molecule having the nucleotide
sequence of nucleotides 109 to 1764 of Figure 2
(Sequence ID No.:21).

4. An isolated DNA molecule having the nucleotide
sequence of nucleotides 2216 to 3937 of Figure 2
(Sequence ID No.:22).

5. An isolated DNA molecule having the nucleotide
sequence of nucleotides 3940 to 5481 of Figure 2
(Sequence ID No.:23).

6. An isolated DNA molecule having the nucleotide
sequence of nucleotides 5654 to 6595 of Figure 2
(Sequence ID No.:24).

7. An isolated DNA molecule having the nucleotide
sequence of nucleotides 6611 to 7588 of Figure 2
(Sequence ID No.:25).

8. An isolated DNA molecule having the nucleotide
sequence of nucleotides 7895 to 9076 of Figure 2
(Sequence ID No.:26).

9. An isolated DNA molecule having the nucleotide
sequence of nucleotides 9241 to 10908 of Figure 2
(Sequence ID No.:27).

10. An isolated DNA molecule having the nucleotide
sequence of nucleotides 10998 to 12296 of Figure 2
(Sequence ID No.:28).

11. An isolated DNA molecule having the nucleotide
sequence of nucleotides 12622 to 13365 of Figure 2
(Sequence ID No.:29).

12. An isolated DNA molecule having the nucleotide
sequence of nucleotides 13769 to 14995 of Figure 2
(Sequence ID No.:30).

13. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 10.

14. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 11.

15. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 12.

16. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 13.

17. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 14.

18. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 15.

19. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 16.

20. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 17.

21. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 18.

22. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 19.

23. An isolated protein having the amino acid
sequence of Figure 10.

24. An isolated protein having the amino acid
sequence of Figure 11.

25. An isolated protein having the amino acid
sequence of Figure 12.

26. An isolated protein having the amino acid
sequence of Figure 13.

27. An isolated protein having the amino acid
sequence of Figure 14.

28. An isolated protein having the amino acid
sequence of Figure 15.

29. An isolated protein having the amino acid
sequence of Figure 16.

30. An isolated protein having the amino acid
sequence of Figure 17.

31. An isolated protein having the amino acid
sequence of Figure 18.

32. An isolated protein having the amino acid
sequence of Figure 19.

33. A recombinant vector comprising a DNA molecule
in accordance with any of claims 1 to 22.

34. A host transformed with a recombinant vector
comprising a DNA molecule in accordance with any of
claims 1 to 22.

35. A host transformed with a recombinant vector in
accordance with claim 2 wherein the host is a
Streptomycete.

36. A host in accordance with claim 35 which is
S. lividans.

37. A process for producing clavulanic acid in a
non-clavulanate-producing host comprising transforming
the host with a DNA molecule in accordance with claim 2
and culturing the host under suitable conditions to
produce clavulanic acid.

38. A process for producing clavulanic acid in
accordance with claim 37 wherein the host is S. lividans.

39. A process for enhancing clavulanic acid
production in a clavulanate-producing host comprising
transforming the host with a DNA molecule comprising a
nucleotide sequence encoding one or more of the enzymes
of the clavulanate synthetic pathway.
FIGURE 2-1

| 10 | 20 | 30 | 40 | 50 | 60

1 gcggaaccgg ccgcccctga gcggggcggc cgggaaggaa acgggccggt cgtcccctcg 60 

*--End of ORF 1 

61 ggagggggcg gccggcccgt ccggtgcgcg cggtgggtgc ggcgcgggTC AGCCGGCCGC 120 
121 GAGGTTGCTG AGGAACTTCG CGGCGACGGG GCCCGCGTCG GCGCCGCCCG ACCCGCCGTC 180 
181 CTCCAGCAGG ACCGACCAGG CGATGTTCCG GTCGCCCTGG TAGCCGATCA TCCAGGCGTG 240 
241 CGTCTTCGGC GGCTTCTCGG TGCCGAACTC GGCGGTACCG GTCTTGGCGT GCGGCTGTCC 300
301 GCCGAGGCCC CGCAGGGCGT CGCCGGCGCC GTCGGTGACG GTCGAACGCA TCATGGAACG 360 
361 CAGCGAGTCG ACGATGCCCG GGGCCATCCG GGGGGCCTGG TGCGGCTTCT TGACCGCGTC 420
421 GGGCACCAGC ACGGGCTGCT TGAACTCGCC CTGCTTGACG GTGGCGGCGA TGGAGGCCAT 480 
481 CACCAGGGGC GACGCCTCGA CCCTGGCCTG TCCGATGGTG GACGCGGCCT TGTCGTTCTC 540
541 GCTGTTGGAG ACGGGGACGC TGCCGTCGAA GGTGGAGGCG CCGACGTCCC AGGTGCCGCC 600
601 GATGCCGAAG GCTTCGGCGG CCTGCTTCAG GCTGGACTCG GAGAGCTTGC TGCGGGAGTT 660 
661 GACGAAGAAC GTGTTGCAGG AGTGGGCGAA GCTGTCCCGG AAGGTCGAGC CCGCGGGCAG 720 
721 CGTGAACTGG TCCTGGTTCT CGAAGCTCTG GCCGTTGACA TGGGCGAACT TCGGGCAGTC 780 
781 GGCCCGCTCC TCCGGGTTCA TCCCCTGCTG GAGCAGGGCC GCGGTGGTGA CCACCTTGAA 840 
841 GGTGGAGCCG GGCGGGTAGC GGCCCTCCAG CGCGCGGTTC ATGCCGGAGG GCACGTTCGC 900 
901 GGCGGCCAGG ATGTTGCCGG TGGCGGGGTC GACGGCGACG ATCGCCGCGT TCTTCTTCGA 960 
961 GCCCTCCAGG GCCGCCGCGG CGGCGGACTG GACCCGCGGG TCGATGGTGG TCTTCACCGG 1020 
1021 CTTGCCCTCG GTGTCCTTGA GGCCGGTGAG CTTCTTGACC ACCTGGCCGG ACTCACGGTC 1080 
1081 CAGGATCACG ACCGAGCGCG CCGCGCCGGA GCCGCCGGTG AGCTGCTTGT CGTAGCGGGA 1140 
1141 CTGGAGGCCC GCCGAGCCCT TGCCGGTCCT GGGGTCGACC GCGCCGATGA TGGAGGCGGC 1200 
1201 CTGGAGGACA TTGCCGTTGG CGTCGAGGAT GTCCGCGCGC TCCCGCGACT TGAGGGCGAG 1260 
1261 GGTCTGCCCC GGAACCATCT GCGGATGGAT CATCTCGGTG TTGAACGCGA CCTTCCACTC 1320 
1321 CTTGCCGCCG CCGACGACCT TCGCGGTGGA GTCCCAGGCG TACTCCCCGG CCCCGGGGAG 1380
1381 GGTCATTCTG ACGGTGAACG GTATCTCCAC CTCGCCCTCG GGGTTCTTCT CCCCGGTCTT 1440 
1441 GGCGGTGATC TCCGTCTTCG TCGGCTTGAG GTTGGTCATG ACGGATTTGA TCAGCGACTC 1500 
1501 GGCGTTGTCC GGGGTGTCCG TCAGCCCGGC GGCCGTCGGG GCGTCGCCCT TCTCCCAGGC 1560 
FIGURE 2-2

1561 GCCGAGGAAG GTGTCGAACT GTCCGGCCGC CGCCTCCACC TCGGGGTCGC CCGAATCCTT 1620
1621 CTCGTCGGCA ACCAGGCTGG TGTAACCCCA ATAGCCGAGC CCCACCGTCA CGGCCAGCCC 1680
1681 GGCGACCACC GCGGTGGCCG CCCGGCCACG GGAGCGGCGC CTGCCCTGCG GCGGGTCATC 1740

<--Beginning of ORF 1
1741 GCCATAGTTG TCGGAATGCG TCATggggcc aggctatgcg ggcgccctct ttccctcctc 1800
1801 cccggatacc gcgtttcagg acagtcaagg ggccgaacgg agggctggac cagccgctca 1860
1861 gcggcccgtt cccacccctt ggggggaagc ggcacccggo aggtgaccga ggcaacatcc 1920
1921 atggaaaggg gagcgaatcg gtcgccgogt tcaccgcgat tggagtagac ctctgaaagc 1980
1981 gtgacagcgg ggagtagcga caaaacggtc agacccctga agggaattga ctgaattcga 2040
2041 gtcatcgggt tcggcgacgg atgggcggtt cggccacgca ccgtcactct tcgtcccctc 2100
2101 ttcacaagaa ctcccgatac gtggagaaga gagcgtgaag agcgcgtccg gtcagggttg 2160

Beginning of ORF 2-->
2161 ccgagaaccg tccaccatga cggagcctgg tactgacgga gtcgggagac cgctcATGTC 2220
2221 CCGTGTATCG ACCGCCCCCA GCGGCAAGCC TACCGCCGCT CACGCCCTCC TGTCACGGTT 2280
2281 GCGTGATCAC GGTGTGGGGA AGGTGTTTGG GGTTGTCGGC CGAGAGGCCG CGTCGATTCT 2340
2341 CTTCGACGAG GTCGACCCCA TCGACTTCGT TCTGACCCGC CACGAGTTCA CCGCGGGTGT 2400
2401 CGCCGCTGAT GTCCTCGCGC GGATCACCGG TCGCCCCCAG GCGTGCTGGG CCACCCTGGG 2460
2461 CCCCGGTATG ACCAACCTCT CCACCGGTAT CGCCACGTCC GTCCTGGACC GCTCGCCGGT 2520
2521 CATCGCGCTC GCCGCGCAGT CGGAGTCGCA CGACATCTTC CCGAACGACA CCCACCAGTG 2580
2581 CCTGGACTCG GTGGCGATCG TCGCCCCGAT GTCCTTGTAC GCCGTGGAGC TCCAGCGGCC 2640
2641 CCACGAGATC ACCGACCTCG TCGACTCCGC CGTGAACGCG GCCATGACCG AGCCGGTCGG 2700
2701 GCCCTCCTTC ATCTCCCTCC CGGTGGACCT GCTCGGCTCC TCCGAGGGCA TCGACACCAC 2760
2761 CGTCCCCAAC CCGCCGGCGA ACACCCCGGC GAAACCGGTC GGCGTCGTCG CCGACGGCTG 2820
2821 GCAGAAGGCC GCCGACCAGG CCGCCGCCCT GCTCGCCGAG GCCAAGCACC CGGTGCTCGT 2880
2881 CGTCGGAGCG GCCGCGATCC GCTCGGGCGC CGTCCCGGCG ATCCGCGCCC TGGCCGAGCG 2940
2941 CCTGAACATC CCGGTCATCA CGACCTACAT CGCCAAGGGT GTCCTGCCGG TCGGCCACGA 3000
3001 GCTGAACTAC GGCGCCGTCA CCGGCTACAT GGACGGCATC CTCAACTTCC CGGCGCTCCA 3060
3061 GACCATGTTC GCCCCGGTGG ACCTCGTCCT CACCGTCGGC TACGACTACG CCGAGGACCT 3120
3121 GCGCCCGTCC ATGTGGCAGA AGGGCATCGA GAAGAAGACC GTCCGTATCT CCCCGACGGT 3180
3181 CAACCCGATC CCCCGGGTCT ACCGGCCCGA CGTCGACGTC GTCACCGACG TCCTCGCCTT 3240
FIGURE 2-3

3241 CGTGGAGCAC TTCGAGACCG CGACCGCCTC CTTCGGGGCC AAGCAGCGCC ACGACATCGA 3300 

3301 GCCGCTGCGC GCCCGGATCG CGGAGTTCCT GGCCGACCCG GAGACCTACG AGGACGGCAT 3360

3361 GCGCGTCCAC CAGGTCATCG ACTCCATGAA CACCGTCATG GAGGAGGCCG CCGAGCCCGG 3420 

3421 CGAGGGCACG ATCGTCTCCG ACATCGGCTT CTTCCGTCAC TACGGTGTGC TCTTCGCCCG 3480 

3481 CGCCGACCAG CCCTTCGGCT TCCTCACCTC GGCGGGCTGC TCCAGCTTCG GCTACGGCAT 3540 

3541 CCCCGCCGCC ATCGGCGCCC AGATGGCCCG CCCGGACCAG CCGACCTTCC TCATCGCGGG 3600 

3601 TGACGGCGGC TTCCACTCCA ACAGCTCCGA CCTGGAGACC ATCGCCCGGC TCAACCTGCC 3660 

3661 GATCGTGACC GTCGTCGTCA ACAACGACAC CAACGGCCTG ATCGAGCTGT ACCAGAACAT 3720 

3721 CGGTCACCAC CGCAGCCACG ACCCGGCGGT CAAGTTCGGC GGCGTCGACT TCGTCGCGCT 3780 

3781 CGCCGAGGCC AACGGTGTCG ACGCCACCCG CGCCACCAAC CGCGAGGAGC TGCTCGCGGC 3840 

3841 CCTGCGCAAG GGTGCCGAGC TGGGTCGTCC GTTCCTCATC GAGGTCCCGG TCAACTACGA 3900 

End of ORF 2— > Beginning of ORF 3--> 

3901 CTTCCAGCCG GGCGGCTTCG GCGCCCTGAG CATCTGAtcA TGGGGGCACC GGTTCTTCCG 3960 

3961 GCTGCCTTCG GGTTCCTGGC CTCCGCCCGA ACGGGCGGGG GCCGGGCCCC CGGCCCGGTC 4020 

4021 TTCGCGACCC GGGGCAGCCA CACCGACATC GACACGCCCC AGGGGGAGCG CTCGCTCGCG 4080 

4081 GCGACCCTGG TGCACGCCCC CTCGGTCGCG CCCGACCGCG CGGTGGCGCG CTCCCTCACC 4140 

4141 GGCGCGCCCA CCACCGCGGT GCTCGCCGGT GAGATCTACA ACCGGGACGA ACTCCTCTCC 4200 

4201 GTGCTGCCCG CCGGACCCGC GCCGGAGGGG GACGCGGAGC TGGTCCTGCG GCTGCTGGAA 4260 

4261 CGCTATGACC TGCATGCCTT CCGGCTGGTG AACGGGCGCT TCGCGACCGT GGTGCGGACC 4320 

4321 GGGGACCGGG TCCTGCTCGC CACCGACCAC GCCGGTTCGG TGCCGCTGTA CACCTGTGTG 4380 

4381 GCGCCGGGCG AGGTCCGGGC GTCCACCGAG GCCAAGGCGC TCGCCGCGCA CCGCGACCCG 4440 

4441 AAGGGCTTCC CGCTCGCGGA CGCCCGCCGG GTCGCCGGTC TGACCGGTGT CTACCAGGTG 4500 

4501 CCCGCGGGCG CCGTGATGGA CATCGACCTC GGCTCGGGCA CCGCCGTCAC CCACCGCACC 4560 

4561 TGGACCCCGG GCCTCTCCCG CCGCATCCTG CCGGAGGGCG AGGCCGTCGC GGCCGTGCGG 4620 

4621 GCCGCGCTGG AGAAGGCCGT CGCCCAGCGG GTCACCCCCG GCGACACCCC GTTGGTGGTG 4680 

4681 CTCTCCGGCG GAATCGACTC CTCCGGGGTC GCGGCCTGTG CGCACCGGGC GGCCGGGGAA 4740 

4741 CTGGACACGG TGTCCATGGG CACCGACACG TCCAACGAGT TCCGCGAGGC CCGGGCGGTC 4800 

4801 GTCGACCATC TGCGCACCCG GCACCGGGAG ATCACCATCC CGACCACCGA GCTGCTGGCG 4860 
FIGURE 2-4

4861 CAGCTCCCGT ACGCGGTGTG GGCCTCCGAG TCGGTGGACC CGGACATCAT CGAGTACCTG 4920 

4921 CTCCCCCTGA CAGCGCTCTA CCGGGCGCTC GACGGGCCGG AGCGCCGCAT CCTCACCGGG 4980 

4981 TACGGCGCGG ACATCCCCCT CGGGGGCATG CACCGCGAGG ACCGGCTGCC CGCGCTGGAC 5040 

5041 ACCGTTCTCG CGCACGACAT GGCCACCTTC GACGGGCTGA ACGAGATGTC CCCGGTGCTG 5100

5101 TCCACGCTGG CGGGGCACTG GACCACCCAC CCGTACTGGG ACCGGGAGGT CCTCGATCTG 5160 

5161 CTGGTCTCGC TGGAGGCCGG GCTCAAGCGG CGGCACGGCC GGGACAAGTG GGTGCTGCGC 5220 

5221 GCCGCGATGG CCGACGCCCT CCCGGCGGAG ACCGTCAACC GGCCCAAGCT GGGCGTCCAC 5280 

5281 GAGGGCTCGG GCACCACGTC CTCGTTCTCC CGGCTGCTGC TGGACCACGG TGTCGCCGAG 5340 

5341 GACCGCGTCC ACGAGGCGAA GCGGCAGGTG GTGCGCGAGC TGTTCGATCT CACGGTCGGG 5400

5401 GGCGGACGGC ACCCCTCCGA GGTGGACACC GACGATGTGG TGCGCTCCGT GGCCGACCGG 5460 
End of ORF 3-->
5461 ACCGCGCGGG GGGCGGCCTA Gtcccgccac ggggagcccg ccggacgccg gacccgcgcg 5520

5521 ggacccgtac ccggggccgc ccgcggactc cggcgcaccg gcacccctgt cccccacccg 5580 
5581 ttgacgaccg tcggccctcg gccctcgcgg cccctgacga ccgtcgcccg attcccagga 5640 
Beginning of ORF 4-->
5641 gggagctgaa agcGTGGAGC GCATCGACTC GCACGTTTCA CCCCGCTACG CACAGATCCC 5700
5701 CACCTTCATG CGCCTGCCGC ACGATCCCCA GCCCCGCGGC TATGACGTGG TGGTCATCGG 5760 
5761 AGCCCCCTAC GACGGGGGCA CCAGCTACCG TCCCGGCGCC CGGTTCGGCC CCCAGGCCAT 5820 
5821 CCGCAGTGAG TCGGGCCTCA TCCACGGTGT CGGCATCGAC CGGGGCCCCG GCACGTTCGA 5880 
5881 CCTGATCAAC TGTGTCGACG CCGGGGACAT CAATCTGACG CCGTTCGACA TGAACATCGC 5940
5941 GATCGACACG GCGCAGAGCC ATCTGTCGGG CCTGCTGAAG GCCAACGCCG CCTTTCTGAT 6000
6001 GATCGGCGGC GACCACTCGC TGACGGTGGC CGCCCTGCGC GCGGTCGCGG AGCAGCACGG 6060
6061 CCCGCTCGCC GTGGTGCACC TGGACGCGCA CTCCGACACC AACCCGGCCT TCTACGGGGG 6120
6121 CCGGTACCAC CACGGCACCC CCTTCCGGCA CGGGATCGAC GAGAAGCTGA TCGACCCGGC 6180 
6181 GGCGATGGTC CAGATCGGCA TCCGGGGCCA CAACCCGAAG CCGGACTCGC TCGACTACGC 6240 
6241 CCGGGGCCAC GGCGTCCGGG TGGTCACGGC GGACGAGTTC GGCGAGCTGG GGGTGGGCGG 6300
6301 GACCGCCGAC CTCATCCGCG AGAAGGTCGG CCAGCGGCCC GTGTACGTCT CGGTCGACAT 6360 
6361 CGACGTGGTC GACCCCGCCT TCGCCCCCGG TACGGGCACG CCCGCGCCGG GCGGGCTCCT 6420 
6421 CTCGCGCGAG GTGCTGGCGC TGCTGCGCTG CGTGGGTGAC CTGAAGCCGG TCGGCTTCGA 6480
6481 CGTGATGGAG GTGTCACCCC TCTACGACCA CGGCGGGATC ACTTCGATCC TGGCCACGGA 6540 
FIGURE 2-5

End of ORF 4--> 

6541 GATCGGTGCG GAACTGCTCT ACCAGTACGC CCGAGCCCAC AGAACCCAGT TGTGAaggag 6600 

Beginning of ORF 5-->

6601 acatcgtgtc ATGGCCTCTC CGATAGTTGA CTGCACCCCG TACCGCGACG AGCTGCTCGC 6660 

6661 GCTCGCCTCC GAGCTTCCCG AGGTGCCGCG CGCGGACCTC CATGGCTTCC TCGACGAGGC 6720 

6721 GAAGACGCTG GCCGCCCGTC TCCCGGAGGG GCTGGCCGCC GCTCTCGACA CCTTCAACGC 6780 

6781 CGTGGGCAGC GAGGACGGTT ATCTGCTGCT GCGCGGGCTG CCCGTCGACG ACAGCGAGCT 6840 

6841 GCCCGAGACG CCGACCTCCA CCCCGGCCCC GCTGGACCGC AAGCGGCTGG TGATGGAGGC 6900 

6901 CATGCTCGCG CTGGCCGGCC GCCGGCTCGG TCTGCACACG GGGTACCAGG AGCTGCGCTC 6960 

6961 GGGCACGGTC TACCACGACG TGTACCCGTC GCCCGGCGCG CACTACCTGT CCTCGGAGAC 7020 

7021 CTCCGAGACG CTGCTGGAGT TCCACACGGA GATGGCGTAC CACATCCTCC AGCCGAACTA 7080 

7081 CGTCATGCTG GCCTGCTCCC GCGCGGACCA CGAGAACCGG GCGGAGACGC TGGTCGGCTC 7140 

7141 GGTCCGCAAG GCGCTGCCCC TGCTGGACGA GAAGACCCGG GCCCGTCTCT TCGACCGCAA 7200 

7201 GGTGCCCTGC TGCGTGGACG TGGCCTTCCG CGGCGGGGTC GACGACCCGG GCGCGATCGC 7260 

7261 CAACGTCAAG CCGCTCTACG GGGACGCGAA CGACCCGTTC CTCGGGTACG ACCGCGAGCT 7320 

7321 GCTGGCGCCG GAGGACCCCG CGGACAAGGA GGCCGTCGCC CATCTGTCCC AGGCGCTCGA 7380 

7381 CGATGTGACC GTCGGGGTGA AGCTCGTCCC CGGTGACGTC CTCATCATCG ACAACTTCCG 7440 

7441 CACCACGCAC GCGCGGACGC CGTTCTCGCC CCGCTGGGAC GGGAAGGACC GCTGGCTGCA 7500

7501 CCGCGTCTAC ATCCGCACCG ACCGCAATGG ACAGCTCTCC GGCGGCGAGC GCGCGGGCGA 7560

End of ORF 5-->

7561 CACCATCTCG TTCTCGCCGC GCCGCTGAgc ccggctcccc gaggccctgg gccccggcgc 7620

7621 cggaaccggc tcccggtcct gccccctcac ccgccgcgcg ggtgaggggg caggcccctt 7680 

7681 tgtgccgggt gccgtgcgtc ctgcgagggt gccggggcgg gggggacggc ggaggtgccc 7740 

7741 ggcggccggg tgccgtgcgc cgcccgtggg tgctgtacag cactccgtgt gccgtgcgcc 7800 

7801 accccgtgco taacrtttgcc actctatggg aaataatgca gagtgcgacg ggtgaggccg 7860 

Beginning of ORF 6-->
7861 tcgccgtgcc ctttccgtga caggagacgc tgacATGTCC GACAGCACAC CGAAGACGCC 7920 

7921 CCGGGGATTC GTGGTGCACA CGGCGCCGGT GGGCCTGGCC GACGACGGCC GCCACGACTT 7980 

7981 CACCGTCCTC GCCTCCACCG CCCCGGCCAC CGTGAGCGCC GTCTTCACCC GCTCCCGCTT 8040

8041 CGCCGGGCCG AGCGTCGTGC TGTGCCGGGA GGCGGTGGCC GACGGGCAGG CGCGCGGTGT 8100 

8101 GGTGGTGCTG GCCCGCAACG CGAATGTCGC GACCGGCCTG GAGGGCGAGG AGAACGCGCG 8160 



FIGURE 2-6 

8161 CGAGGTGCGC GAGGCCGTCG CCCGGGCCCT CGGGCTGCCG GAGGGCGAGA TGCTGATCGC 8220
8221 CTCCACCGGG GTGATCGGCC GGCAGTACCC GATGGAGAGC ATCCGGGAGC ACCTCAAGAC 8280 
8281 GCTGGAGTGG CCCGCCGGGG AGGGCGGCTT CGACCGCGCG GCCCGCGCCA TCATGACGAC 8340
8341 CGACACCCGG CCCAAGGAGG TCCGGGTCAG CGTCGGCGGG GCGACCCTCG TGGGCATCGC 8400
8401 CAAGGGCGTC GGCATGCTGG AGCCCGACAT GGCGACGCTG CTGACCTTCT TCGCCACGGA 8460
8461 CGCCCGGCTG GACCCGGCCG AGCAGGACCG CCTCTTCCGC CGGGTCATGG ACCGCACCTT 8520 
8521 CAACGCGGTC AGCATCGACA CCGACACCTC CACCAGCGAC ACGGCGGTGC TGTTCGCCAA 8580 
8581 CGGCCTGGCG GGCGAGGTCG ACGCCGGGGA GTTCGAGGAG GCGCTGCACA CGGCGGCGCT 8640 
8641 GGCCCTGGTC AAGGACATCG CGAGCGACGG CGAGGGCGCG GCCAAGCTGA TCGAGGTCCA 8700 
8701 GGTCACCGGC GCCCGCGACG ACGCCCAGGC CAAGCGGGTC GGCAAGACCG TCGTCAACTC 8760 
8761 CCCGTTGGTG AAGACCGCCG TGCACGGCTG CGACCCCAAC TGGGGCCGGG TCGCCATGGC 8820 
8821 GATCGGCAAG TGCTCGGACG ACACCGACAT CGACCAGGAG CGGGTGACGA TCCGCTTCGG 8880
8881 CGAGGTCGAG GTCTATCCGC CGAAGGCCCG GGGCGACCAG GCCGACGACG CGCTGCGGGC 8940 
8941 CGCCGTCGCG GAGCATCTGC GGGGCGACGA GGTGGTCATC GGGATCGACC TCGCCATCGC 9000 
9001 GGACGGGGCC TTCACCGTCT ACGGCTGCGA CCTCACCGAG GGCTATGTCC GGCTGAACTC 9060 
End of ORF 6-->
9061 GGAGTACACC ACCTGAtccc cggacoggga acgggccgcc gccccgitcc ctgtccgcic 9120
9121 ccgtcccgtg tggttotacc gaccgttccc cggctatgcg cacgggocgg agcggccccc 9180 
9181 gccgggcccc gcccggccgc acgatgaggg gcgatgcaag gtgacgaggg caggagggac 9240 
Beginning of ORF 7-->
9241 ATGGAGACCA CTCGGTCGAC GACCGCGGAC GAGGGCTTCG ACGCCGGGGT ACGGGGAGTG 9300
9301 GTCGCGCCGA CCGACGCCCC GGGCGGGACG CTGCGGCTGG TCCGCACGGA CGACTTCGAC 9360 
9361 TCGCTCGACC CCGGCAACAC GTACTACGCC TACACCTGGA ACTTCCTCCG GCTCATCGGC 9420 
9421 CGGACGCTGG TCACCTTCGA CACCGCGCCG GGCAAGGCGG GCCAGCGGCT CGTGCCCGAC 9480 
9481 CTCGCCGAGT CGCTGGGCGA GTCCTCCGAG GACGGCCGGG TCTGGACCTA CCGGCTGCGC 9540
9541 GAGGGCCTGC GCTACGAGGA CGGCACGCCG GTCGTCTCGG CCGACATCAA GCACGCCATC 9600 
9601 GCCCGCAGCA ACTACGGCAC CGATGTCCTG GGCGCCGGTC CGACCTACTT CCGCCACCTC 9660 
9661 CTGGGCACCG AGTACGGCGG CCCCTGGCGG GAGCCGGACG CCGACGGACC GGTGACGCTG 9720
9721 GAGACCCCGG ACGAGCGGAC GCTGGTCTTC CGGCTGCGGG AGCCGTTCGC GGGGATGGAT 9780 
9781 CTGCTGGCGA CCATGCCGTC CACCACCCCC GTGCCGCGCG ACCGGGACAC CGGCGCCGAG 9840 






FIGURE 2-7 

9841 TACCGGCTGC GGCCCGTGGC GACCGGCCCG TACCGGATCG TCTCGTACAC CCGGGGCGAG 9900 
9901 CTGGCCGTCC TGGAGCCCAA TCCGCACTGG GACCCCGAGA CCGACCCGGT GCGCGTCCAG 9960 
9961 CGCGCCTCCC GGATCGAGGT GCACCTCGGC AAGGACCCGC ACGAGGTGGA CCGCATGCTG 10020 
10021 CTGGCGGGCG AGGCCCATGT GGACCTCGCG GGCTTCGGTG TGCAGCCCGC GGCCCAGGAG 10080 
10081 CGCATCCTCG CCGAGCCGGA GCTGCGCGCG CACGCGGACA ACCCGCTGAC CGGCTTCACC 10140 
10141 TGGATCTACT GCCTGTCGAG CCGGATCGCC CCGTTCGACA ATGTGCACTG CCGGCGGGCC 10200 
10201 GTGCAGTTCG CCACCGACAA AGCGGCCATG CAGGAGGCGT ACGGCGGCGC GGTGGGCGGC 10260 
10261 GACATCGCGA CCACCCTGCT GCCCCCGACC CTCGACGGCT ACAAGCACTT CGACCGCTAC 10320 
10321 CCGGTCGGCC CCGAGGGCAC CGGCGACCTG GAGGCCGCCC GCGCCGAGCT GAAGCTGGCC 10380 
10381 GGGATGCCCG ACGGCTTCCG CACCAGGATC GCCGCCCGCA AGGACCGGCT CAAGGAGTAC 10440 
10441 CGGGCCGCCG AGGCGCTGGC CGCCGGGCTC GCCCGGGTCG GCATCGAGGC GGAGGTGCTG 10500 
10501 GACTTCCCGT CGGGCGACTA CTTCGACCGC TACGGCGGCT GCCCGGAGTA TCTGCGCGAG 10560 
10561 CACGGGATCG GGATCATCAT GTTCGGCTGG GGCGCCGACT TCCCCGACGG ATACGGCTTC 10620 
10621 CTCCAGCAGA TCACCGACGG GCGCGCGATC AAGGAGCGCG GCAACCAGAA CATGGGCGAG 10680 
10681 CTGGACGACC CGGAGATCAA CGCGCTGCTG GACGAGGGGG CGCAGTGCGC CGACCCGGCG 10740 
10741 CGGCGCGCGG AGATCTGGCA CCGCATCGAC CAGCTCACGA TGGACCACGC GGTCATCGTT 10800 

10801 CCGTATCTGT ACCCGCGGTC CCTGCTCTAC CGGCACCCGG ACACCCGCAA CGCCTTCGTC 10860

End of ORF 7-->
10861 ACCGGCTCCT TCGGGATGTA CGACTACGTG GCGCTCGGCG CGAAGTGAgc acggggtccg 10920

10921 gccccgggac cgtatgtccc ggggccggac cccgcccgtt ccccgcccgg tccggtccgg 10980 

10981 acccggtcgc ggcccgcTCA^CMACATC 8 CGGGCCCCGG CCGCGACCCC GCGCCGGATC 11040 

11041 GGCCAGTGGC CCTGCGCCAG GGGCCGTTCC ACGCTGCGGC AGGCGAGAGC GGCCTCGCGG 11100

11101 AACTCCGCCT CGTACAGCGC GAGCTGGCGC AGGAACTGCC GGGTCGGGCC GGTCAGGCTG 11160

11161 GTCCCCCGCG GGCTGCGCAG CAGCAGCCGG GCGCCGAGGG ACTGCTCCAG CCGGTGAATC 11220

11221 CGGCGGGTGA GCGCCGACTG GCTGATCGAC AGCACCGCCG CGGCCCGGTT GATGCTGCCG 11280

11281 TGCCGGGCCA CGGCCTGGAG CAGATGGAGA TCGTCCACAT CCAGTTTGCG GCCCTCGGCC 11340

11341 TGGCCGGGCA CGGAGCCCTG GTCGGGTCCC GCCCCGAAGC GGCGGGCGTC CGCGCCGGTG 11400 

11401 CGCTCCGCGT ACCACTGCGC CCACCAGGGC TCGTCCAGCA GGTCGCGGTG GTGTTCGGCG 11460 

11461 AAGCGCCGGA GCTGGACCTC GGCGATCAGC GCGGCCAGCC GTCCCGCCAG CGCCCGGGGC 11520 

FIGURE 2-8

11521 ACGATGGTGG GGTCGACGAG CAGACTCGTG GTGCGGCGCG GGCGCTCCGC CAGGGAGCGG 11580
11581 CGCACCAGCG AGGGGTCCTG CACCGCCGGG TGGGTGGGCG AGCCGAGACC TATCGCGTCC 11640
11641 CCGCGGCGCA GGATGCCCCG GGCAACCGAT GCCCCCGTGA TGTGGAGCCG GGTGGGCGCG 11700
11701 GTGAGCCCGG CCAGCTGGAA GACACGTGTC ACCAGGATCT CCGAGCCGGG TCCCGTCTCG 11760
11761 GACACCCAGG TCTCGTCCCG CAGATCGGCG AGCGAGACCT CCCGCCGGGC GGCCAGCGGA 11820
11821 TGGTCCCGGG GCAGGATCAC CCACAGCGGG TCGTCCAGCA CCTCACAGGT GCGCACGGAC 11880
11881 CGCTCCAGGC TGTGCCGGGG GGACTGGAGG CTCCAGGTGT AGGCCGCGTC CACCTGGTAG 11940
11941 CCCGCCAGTT GGGCGGCGAC CTGGTGCGGG GCCTCGTGCC GGACCGACAG CAGCAGGTCC 12000
12001 AGCGAGGCCG CCGCGTCCTC CACCACCTCG TCGAGCAGGG GTTCCGTGGA GACCAGCGAC 12060
12061 AGCACCTCCG GGGCGTCCAC GGCCTCGGAG CCATGGCCGA AGATATGCGT CCGCGCGGCC 12120
12121 AGGTCGACCT GGTGGAAGAA CCGCCGCCCG GCGACGAGGA TGCGGGAGCC CGCGGTGGTC 12180 
12181 AGCCGGGCCG TGTGGCGGCT GCGCAGGGTC AGCGGGAGGC CGACGATCCG GTCCAGCCGG 12240
12241 TCGAGTCTGC GCTCCACGGT GCCGTGCCGG ACACCCGTCC GCCGGGCCAC TTCCATgagg 12300 
12301 tctccgcagt gtcccaccgc gtccagtaaa gacagotcgc atcggctgac accagcagac 12360 
12361 gtcggttctg occcgagoga caatgtcggt tcccttttec gtcaaggact gtaccgctga 12420
12421 attgtccgaa gtggctcttg aattgcttcg gaatcgotcc taggcagcgc cgctcttcgg 12480
12481 attctcctcg ccgggaagcg gaacgcgccc ggccggatgg cgggcgcgct ccgggcgccg 12540 
12541 tcccgggoac gggggacggg gcocggcacg 9ecggecacc cggtccgggc gcgcggcgtg 12600 
12601 gacctggtcg gcggacgggt gTCAGACCTG TtcSgTGGGG CGTATGAAGA TCTCGTGGAC 12660
12661 GGTCGCGTGG TGCGGCGCGG TCACGGCGTA GCGGACCGCC TCCGCGATGT CCTGGGCCTG 12720
12721 GAGCTTGCGG ATCTGGCTGA TCCGCTGCTC GTACATCTCC TTGGTGGCGG TGTGGGTGAT 12780
12781 GTGGCCGCGC AGCTCCGTGT CGGTGGTGCC CGGCTCGATG ACGACGACCC GCACCCCGCG 12840
12841 CTCGGTGACC TCCTGGCGCA GCGTCTCGCT GAACGCGTTC ACACCGAACT TCGTGGCCTG 12900
12901 GTAGACGGCC GCGTTGCGGA CGTTCACCCG GCCCGCGATC GAGGACATCT GCACCACGGT 12960
12961 GCCCTTGCTG CGCAGCAGAT GGGGAAGGGC CGCCCGGGTC ATGTACATCA GGCCCAGGAG 13020
13021 ATTGGTGTCG ATCATCCGGG TCCAGTCGGT GGTGTCGGCG TCCTCCACCG GGCCGAGCAG 13080
13081 CATGATCCCG GCGTTGTTGA CGAGGATGTC GAGGCCGCCC AGCGCCTCGA CGGTGGAGGC 13140
13141 GACGGCGGCG TCCACCCCCT GCCGGTCGGC GACGTCGAGT TCGAGGACAT GGACCTTCGC 13200
FIGURE 2-9

13201 CCCGGCGGCG GTCAGCTCGT CACCCAGGGC GCGCAGCTTC TCGACCCGGC GCGCGGCGAT 13260 

13261 GGCCACGGCG GCGCCCTCGG CGGCCAGGGC GCGGGCCGTG GCCTCGCCGA TGCCCGAGCT 13320 

<--Beginning of ORF 9

13321 CGCGCCCGTG ATGAGCGCGA CTTTCCCCTG GAGTGCGGAT GGCATcattt cctccacatg 13380 

13381 gtgctgcgat cgtggtgagc gtatgaagaa ggggtgagac ctgccgtgcc ggggcgggtt 13440 

13441 ccgtacgccg gaccgttgcg gtgggcacgg ccgaccgggt acggatggcc gcagttcccc 13500 

13501 ggggagttcc cggggaatgg tgaataccgc ggcgctctcc gatggtcttc ggaggacacc 13560 

13561 cggggattca ccgggaatca gcggccggag ttctccccgt ccacggcaga cgctatcagc 13620 


13621 gtcgcattcc ccggtgaatt cccttcggtg gaccgggtta tgactgtttc cgccgggtta 13680 

13681 tgcgcgccgc cccggcggac cggccacccg cccgggggct gcggcagatt gggcgccacg 13740 

Beginning of ORF 10 > 

13741 acatggcgcg agcagcgatc ggcggtggAT GATGAACGAG GCAGCGCCTC AGTCCGACCA 13800 

13801 GGTGGCACCG GCGTATCCGA TGCACCGGGT CTGCCCGGTC GACCCGCCGC CGCAACTGGC 13860 

13861 CGGGCTGCGG TCCCAGAAGG CCGCGAGCCG GGTGACGCTG TGGGACGGCA GCCAGGTGTG 13920 

13921 GCTGGTGACC TCGCACGCCG GGGCCCGGGC CGTCCTGGGC GACCGCCGCT TCACCGCGGT 13980 

13981 GACGAGCGCG CCCGGCTTCC CGATGCTGAC CCGCACCTCC CAACTGGTGC GCGCCAACCC 14040 

14041 GGAGTCGGCG TCGTTCATCC GCATGGACGA CCCGCAGCAC TCCCGGCTGC GCTCGATGCT 14100 

14101 CACCCGGGAC TTCCTGGCCC GCCGCGCCGA GGCGCTGCGC CCCGCGGTGC GGGAGCTGCT 14160 

14161 GGACGAGATC CTGGGCGGGC TGGTGAAGGG GGAGCGGCCG GTCGACCTGG TCGCCGGACT 14220 

14221 GACGATCCCG GTGCCCTCGC GGGTCATCAC CCTGCTCTTC GGCGCCGGTG ACGACCGCCG 14280 

14281 GGAGTTCATC GAGGACCGCA GCGCGGTCCT CATCGACCGC GGCTACACCC CGGAGCAGGT 14340 

14341 CGCCAAGGCC CGGGACGAAC TCGACGGCTA TCTGCGGGAG CTGGTCGAGG AGCGGATCGA 14400 

14401 GAACCCGGGC ACCGACCTGA TCAGCCGGCT CGTCATCGAC CAGGTGCGGC CGGGGCATCT 14460 

14461 GCGGGTCGAG GAGATGGTCC CGATGTGCCG GCTGCTGCTG GTGGCCGGTC ACGGCACCAC 14520 
14521 CACCAGCCAG GCGAGCCTGA GCCTGCTCAG CCTGCTCACC GACCCGGAGC TGGCCGGGCG 14580 
14581 CCTCACCGAG GACCCGGCCC TGCTGCCCAA GGCGGTCGAG GAGCTGCTGC GCTTCCACTC 14640 
14641 CATCGTGCAG AACGGGCTGG CCCGTGCCGC GGTGGAGGAC GTCCAGCTCG ACGATGTGCT 14700 
14701 CATCCGGGCG GGCGAGGGCG TGGTGCTGTC GCTGTCGGCG GGCAACCGGG ACGAGACGGT 14760 
14761 CTTCCCCGAC CCGGACCGGG TGGACGTGGA CCGCGACGCC CGCCGCCATC TCGCCTTCGG 14820 
14821 CCACGGCATG CACCAGTGCC TGGGCCAGTG GCTGGCCCGG GTGGAGCTGG AGGAGATCCT 14880 
FIGURE 2-10

14881 CGCCGCGGTG CTGCGCTGGA TGCCCGGTGC CCGGCTCGCG GTGCCCTTCG AGGAGCTGGA 14940

End of ORF 10-->
14941 CTTCCGTCAT GAGGTGTCCA GTTACGGCCT CGGCGCCCTC CCGGTGACCT GGTGAgcggc 15000
15001 gtggagcggc tgaccgtcgt cctcgacgcg tcggcctgct gcgcgatggg gcgctgcgcg 15060
15061 gccacggccc ccgagatct 15079

| 10 | 20 | 30 | 40 | 50 | 60
FIGURE 4 (restriction map; labels recoverable from the drawing: 11.6 kb insert, pcbAB, pcbC, EcoRI and NcoI sites, 2 kb scale bar)

FIGURE 6 (photograph)
| 10 | 20 | 30 | 40 | 50 | 60

1 MTHSDNYGDD PPQGRRRSRG RAATAVVAGL AVTVGLGYWG YTSLVADEKD SGDPEVEAAA 60
61 GQFDTFLGAW EKGDAPTAAG LTDTPDNAES LIKSVMTNLK PTKTEITAKT GEKNPEGEVE 120
121 IPFTVPMTLP GAGEYAWDST AKVVGGGKEW KVAFNTEMIH PCMVPGCTLA LKSRERADIL 180
181 DANGNVLQAA SIIGAVDPRT GKGSAGLQSR YDKQLTGGSG AARSVVILDR ESGQVVKKLT 240
241 GLKDTEGKPV KTTIDPRVQS AAAAALEGSK KNAAIVAVDP ATGNILAAAN VPSGMNRALE 300
301 GRYPPGSTFK VVITAALLQQ GMNPEERADC PKFAHVNGQS FENQDQPTLP AGSTFRDSEA 360
361 HSCNTFFVNS RSKLSESSLK QAAEAPGIGG TWDVGASTFD GSVPVSNSEN DKAASTIGQA 420
421 RVEASPLVMA SIAATVKQGE FKQPVLVPDA VKKPHQAPRM APGIVDSLRS MMRSTVTDGA 480
481 GDALRGLGGQ PHAKTGTAEF GTEKPPKTHA WMIGYQGDRN IAWSVLLEDG GSGGADAGPV 540
541 AAKFLSNLAA GZ 552

FIGURE 10
| 10 | 20 | 30 | 40 | 50 | 60

1 MSRVSTAPSG KPTAAHALLS RLRDHGVGKV FGVVGREAAS ILFDEVDPID FVLTRHEFTA 60
61 GVAADVLARI TGRPQACWAT LGPGMTNLST GIATSVLDRS PVIALAAQSE SHDIFPNDTH 120
121 QCLDSVAIVA PMSLYAVELQ RPHEITDLVD SAVNAAMTEP VGPSFISLPV DLLGSSEGID 180
181 TTVPNPPANT PAKPVGVVAD GWQKAADQAA ALLAEAKHPV LVVGAAAIRS GAVPAIRALA 240
241 ERLNIPVITT YIAKGVLPVG HELNYGAVTG YMDGILNFPA LQTMFAPVDL VLTVGYDYAE 300
301 DLRPSMWQKG IEKKTVRISP TVNPIPRVYR PDVDVVTDVL AFVEHFETAT ASFGAKQRHD 360
361 IEPLRARIAE FLADPETYED GMRVHQVIDS MNTVMEEAAE PGEGTIVSDI GFFRHYGVLF 420
421 ARADQPFGFL TSAGCSSFGY GIPAAIGAQM ARPDQPTFLI AGDGGFHSNS SDLETIARLN 480
481 LPIVTVVVNN DTNGLIELYQ NIGHHRSHDP AVKFGGVDFV ALAEANGVDA TRATNREELL 540
541 AALRKGAELG RPFLIEVPVN YDFQPGGFGA LSIZ 574

FIGURE 11
| 10 | 20 | 30 | 40 | 50 | 60

1 MGAPVLPAAF GFLASARTGG GRAPGPVFAT RGSHTDIDTP QGERSLAATL VHAPSVAPDR 60
61 AVARSLTGAP TTAVLAGEIY NRDELLSVLP AGPAPEGDAE LVLRLLERYD LHAFRLVNGR 120
121 FATVVRTGDR VLLATDHAGS VPLYTCVAPG EVRASTEAKA LAAHRDPKGF PLADARRVAG 180
181 LTGVYQVPAG AVMDIDLGSG TAVTHRTWTP GLSRRILPEG EAVAAVRAAL EKAVAQRVTP 240
241 GDTPLVVLSG GIDSSGVAAC AHRAAGELDT VSMGTDTSNE FREARAVVDH LRTRHREITI 300
301 PTTELLAQLP YAVWASESVD PDIIEYLLPL TALYRALDGP ERRILTGYGA DIPLGGMHRE 360
361 DRLPALDTVL AHDMATFDGL NEMSPVLSTL AGHWTTHPYW DREVLDLLVS LEAGLKRRHG 420
421 RDKWVLRAAM ADALPAETVN RPKLGVHEGS GTTSSFSRLL LDHGVAEDRV HEAKRQVVRE 480
481 LFDLTVGGGR HPSEVDTDDV VRSVADRTAR GAAZ 514

FIGURE 12
| 10 | 20 | 30 | 40 | 50 | 60

1 VERIDSHVSP RYAQIPTFMR LPHDPQPRGY DVVVIGAPYD GGTSYRPGAR FGPQAIRSES 60
61 GLIHGVGIDR GPGTFDLINC VDAGDINLTP FDMNIAIDTA QSHLSGLLKA NAAFLMIGGD 120
121 HSLTVAALRA VAEQHGPLAV VHLDAHSDTN PAFYGGRYHH GTPFRHGIDE KLIDPAAMVQ 180
181 IGIRGHNPKP DSLDYARGHG VRVVTADEFG ELGVGGTADL IREKVGQRPV YVSVDIDVVD 240
241 PAFAPGTGTP APGGLLSREV LALLRCVGDL KPVGFDVMEV SPLYDHGGIT SILATEIGAE 300
301 LLYQYARAHR TQLZ 314

FIGURE 13
| 10 | 20 | 30 | 40 | 50 | 60

1 MASPIVDCTP YRDELLALAS ELPEVPRADL HGFLDEAKTL AARLPEGLAA ALDTFNAVGS 60
61 EDGYLLLRGL PVDDSELPET PTSTPAPLDR KRLVMEAMLA LAGRRLGLHT GYQELRSGTV 120
121 YHDVYPSPGA HYLSSETSET LLEFHTEMAY HILQPNYVML ACSRADHENR AETLVGSVRK 180
181 ALPLLDEKTR ARLFDRKVPC CVDVAFRGGV DDPGAIANVK PLYGDANDPF LGYDRELLAP 240
241 EDPADKEAVA HLSQALDDVT VGVKLVPGDV LIIDNFRTTH ARTPFSPRWD GKDRWLKRVY 300
301 IRTDRNGQLS GGERAGDTIS FSPRRZ 326

FIGURE 14
| 10 | 20 | 30 | 40 | 50 | 60

1 MSDSTPKTPR GFVVHTAPVG LADDGRHDFT VLASTAPATV SAVFTRSRFA GPSVVLCREA 60
61 VADGQARGVV VLARNANVAT GLEGEENARE VREAVARALG LPEGEMLIAS TGVIGRQYPM 120
121 ESIREHLKTL EWPAGEGGFD RAARAIMTTD TRPKEVRVSV GGATLVGIAK GVGMLEPDMA 180
181 TLLTFFATDA RLDPAEQDRL FRRVMDRTFN AVSIDTDTST SDTAVLFANG LAGEVDAGEF 240
241 EEALHTAALA LVKDIASDGE GAAKLIEVQV TGARDDAQAK RVGKTVVNSP LVKTAVHGCD 300
301 PNWGRVAMAI GKCSDDTDID QERVTIRFGE VEVYPPKARG DQADDALRAA VAEHLRGDEV 360
361 VIGIDLAIAD GAFTVYGCDL TEGYVRLNSE YTTZ 394

FIGURE 15
| 10 | 20 | 30 | 40 | 50 | 60

1 METTRSTTAD EGFDAGVRGV VAPTDAPGGT LRLVRTDDFD SLDPGNTYYA YTWNFLRLIG 60
61 RTLVTFDTAP GKAGQRLVPD LAESLGESSE DGRVWTYRLR EGLRYEDGTP VVSADIKHAI 120
121 ARSNYGTDVL GAGPTYFRHL LGTEYGGPWR EPDADGPVTL ETPDERTLVF RLREPFAGMD 180
181 LLATMPSTTP VPRDRDTGAE YRLRPVATGP YRIVSYTRGE LAVLEPNPHW DPETDPVRVQ 240
241 RASRIEVHLG KDPHEVDRML LAGEAHVDLA GFGVQPAAQE RILAEPELRA HADNPLTGFT 300
301 WIYCLSSRIA PFDNVHCRRA VQFATDKAAM QEAYGGAVGG DIATTLLPPT LDGYKHFDRY 360
361 PVGPEGTGDL EAARAELKLA GMPDGFRTRI AARKDRLKEY RAAEALAAGL ARVGIEAEVL 420
421 DFPSGDYFDR YGGCPEYLRE HGIGIIMFGW GADFPDGYGF LQQITDGRAI KERGNQNMGE 480
481 LDDPEINALL DEGAQCADPA RRAEIWHRID QLTMDHAVIV PYLYPRSLLY RHPDTRNAFV 540
541 TGSFGMYDYV ALGAKZ 556

FIGURE 16
| 10 | 20 | 30 | 40 | 50 | 60

1 MEVARRTCVR HGTVERRLDR LDRIVGLPLT LRSRHTARLT TAGSRXLVAG RRFFHQVDIA 60
61 ARTHIFGHGS EAVDAPEVLS LVSTEPLLDE VVEDAAASLD LLLSVRHEAP HQVAAQLAGY 120
121 QVDAAYTWSL QSPRHSLERS VRTCEVLDDP LWVILPRDHP LAARREVSLA DLRDETWVSE 180
181 TGPGSEILVT RVPQLAGLTA PTRLHITGAS VARGILRRGD AIGLGSPTHP AVQDPSLVRR 240
241 SLAERPRRTT SLLVDPTIVP RALAGRLAAL IAEVQLRRFA EHHRDLLDEP WWAQWYAERT 300
301 GADARRPGAG PDQGSVPGQA EGRKLDVDDL HLLQAVARHG SINRAAAVLS ISQSALTRRI 360
361 HRLEQSLGAR LLLRSPRGTS LTGPTRQFLR QLALYEAEFR EAAIACRSVE RPLAQGHWPI 420
421 RRGVAAGARM SGZ 433

FIGURE 17



| 10 | 20 | 30 | 40 | 50 | 60

1 MPSALQGKVA LITGASSGIG EATARALAAE GAAVAIAARR VEKLRALGDE LTAAGAKVHV 60
61 LELDVADRQG VEAAVASTVE ALGGLDILVN NAGIMLLGPV EDADTTDWTR MIDTNLLGLM 120
121 YMTRAALPHL LRSKGTVVQM SSIAGRVNVR NAAVYQATKF GVNAFSETLR QEVTERGVRV 180
181 VVIEPGTTDT ELRGHITHTA TKEMYEQRIS QIRKLQAQDI AEAVRYAVTA PHHATVHEIF 240
241 IRPTDQVZ 248

FIGURE 18
| 10 | 20 | 30 | 40 | 50 | 60

1 MMNEAAPQSD QVAPAYPMHR VCPVDPPPQL AGLRSQKAAS RVTLWDGSQV WLVTSHAGAR 60
61 AVLGDRRFTA VTSAPGFPML TRTSQLVRAN PESASFIRMD DPQHSRLRSM LTRDFLARRA 120
121 EALRPAVREL LDEILGGLVK GERPVDLVAG LTIPVPSRVI TLLFGAGDDR REFIEDRSAV 180
181 LIDRGYTPEQ VAKARDELDG YLRELVEERI ENPGTDLISR LVIDQVRPGH LRVEEMVPMC 240
241 RLLLVAGHGT TTSQASLSLL SLLTDPELAG RLTEDPALLP KAVEELLRFH SIVQNGLARA 300
301 AVEDVQLDDV LIRAGEGVVL SLSAGNRDET VFPDPDRVDV DRDARRHLAF GHGMHQCLGQ 360
361 WLARVELEEI LAAVLRWMPG ARLAVPFEEL DFRHEVSSYG LGALPVTWZ 409

FIGURE 19